{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dewey Defeats Truman\n",
    "\n",
    "In the 1948 US Presidential election, New York Governor Thomas Dewey ran\n",
    "against the incumbent Harry Truman. As usual, a number of polling agencies\n",
    "conducted polls of voters in order to predict which candidate was more likely\n",
    "to win the election."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1936: A Previous Polling Catastrophe\n",
    "\n",
    "In 1936, three elections prior to 1948, the *Literary Digest* infamously\n",
    "predicted a landslide defeat for Franklin Delano Roosevelt. To make this\n",
    "claim, the magazine polled a sample of over 2 million people drawn from\n",
    "telephone and car registrations. As you may know, this sampling scheme suffers\n",
    "from sampling bias: those with telephones and cars tend to be wealthier than\n",
    "those without. In this case, the sampling bias was so great that the *Literary\n",
    "Digest* predicted that Roosevelt would receive only 43% of the popular vote\n",
    "when he actually received 61%, a difference of almost 20 percentage points and\n",
    "the largest error ever made by a major poll. The *Literary Digest* went out of\n",
    "business soon after."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1948: The Gallup Poll\n",
    "\n",
    "Determined to learn from past mistakes, the Gallup Poll used a method called\n",
    "*quota sampling* to predict the results of the 1948 election. In their sampling\n",
    "scheme, each interviewer polled a set number of people from each demographic\n",
    "class. For example, the interviewers were required to interview both men and\n",
    "women of different ages, ethnicities, and income levels to match the\n",
    "demographics in the US Census. This ensured that the poll would not leave out\n",
    "important subgroups of the voting population. Or so they thought.\n",
    "\n",
    "Using this method, the Gallup Poll predicted that Thomas Dewey would earn 5\n",
    "percentage points more of the popular vote than Harry Truman would. This\n",
    "margin was large enough that the *Chicago Tribune* famously printed the headline\n",
    "\"Dewey Defeats Truman\":\n",
    "\n",
    "![](Deweytruman12.jpg)\n",
    "\n",
    "As we know now, Truman ended up winning the election. In fact, he won with\n",
    "about 5 percentage points more of the popular vote than Dewey! What went wrong\n",
    "with the Gallup Poll?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Problem With Quota Sampling\n",
    "\n",
    "Although quota sampling did help pollsters reduce sampling bias, it introduced\n",
    "bias in another way. The Gallup Poll told its interviewers that as long as they\n",
    "fulfilled their quotas they could interview whomever they wished. Here's one\n",
    "possible explanation for why the interviewers ended up polling a\n",
    "disproportionate number of Republicans: at the time, Republicans were on\n",
    "average wealthier and more likely to live in nicer neighborhoods, making them\n",
    "easier to interview. This explanation is supported by the fact that the Gallup\n",
    "Poll overestimated the Republican vote by 2 to 6 percentage points in each of\n",
    "the three prior elections.\n",
    "\n",
    "These examples highlight the importance of understanding sampling bias as much\n",
    "as possible during the data collection process. Both the *Literary Digest* and\n",
    "the Gallup Poll made the mistake of assuming their methods were unbiased when\n",
    "their sampling schemes were based on human judgement all along.\n",
    "\n",
    "We now rely on **probability sampling**, a family of sampling methods that\n",
    "assigns precise probabilities to the appearance of each sample, to reduce bias\n",
    "as much as possible in our data collection process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Big Data?\n",
    "\n",
    "In the age of Big Data, we are tempted to deal with bias by collecting more\n",
    "data. After all, we know that a census will give us perfect estimates;\n",
    "shouldn't a very large sample give almost perfect estimates regardless of the\n",
    "sampling technique?\n",
    "\n",
    "We will return to this question after discussing probability sampling methods\n",
    "to compare the two approaches."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
